An Audio-Visual Separation Model Integrating Dual-channel Attention Mechanism

Authors

Abstract

Sound source separation is the extraction of targeted sounds from a noisy environment; it plays an important role in signal processing and has been studied extensively. However, most of this research extracts only audio information for processing and ignores visual information, which wastes feature information. In addition, some researchers have fused the extracted features but have not considered the weights of different features, leading to poor model performance. This paper uses a multi-modal approach to separate sound sources and solve these problems. We constructed an Audio-Visual separation model integrating a Dual-channel Attention mechanism, named AVDA. The model realizes separation through dynamic fusion of features. Specifically, we first take video as the input data, then preprocess and segment it to obtain video frames and audio. These are fed into feature extractors that integrate attention mechanisms. Finally, a prediction component outputs a predicted spectrogram, which is compared with the ground-truth spectrogram for subjective evaluation. Three indexes, signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifact ratio (SAR), are used for quantitative comparison. Experimental results on the MUSIC-21 dataset show that AVDA achieves 10.96, 17.91, and 12.77 on these three indexes, respectively, and its performance is significantly better than that of other models on audio-visual separation tasks.
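The abstract does not give implementation details, but the core idea of weighting the audio and visual feature channels before fusing them can be sketched in code. The following is a minimal PyTorch sketch under assumed tensor shapes; the module name DualChannelAttentionFusion, the sigmoid gating scheme, and all dimensions are illustrative assumptions rather than the actual AVDA architecture.

```python
# Minimal sketch of attention-weighted audio-visual feature fusion.
# Module name, gating scheme, and tensor shapes are assumptions, not the AVDA model.
import torch
import torch.nn as nn


class DualChannelAttentionFusion(nn.Module):
    """Weights the audio and visual feature channels dynamically before fusing them."""

    def __init__(self, audio_dim: int, visual_dim: int, fused_dim: int):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        # One scalar attention weight per modality and time step, derived from its own features.
        self.audio_gate = nn.Sequential(nn.Linear(fused_dim, 1), nn.Sigmoid())
        self.visual_gate = nn.Sequential(nn.Linear(fused_dim, 1), nn.Sigmoid())

    def forward(self, audio_feat: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        a = self.audio_proj(audio_feat)    # (batch, time, fused_dim)
        v = self.visual_proj(visual_feat)  # (batch, time, fused_dim)
        wa = self.audio_gate(a)            # (batch, time, 1) dynamic audio weight
        wv = self.visual_gate(v)           # (batch, time, 1) dynamic visual weight
        return wa * a + wv * v             # fused audio-visual representation


# Toy usage: batch of 4, 64 time steps, 128-d audio and 512-d visual features (assumed sizes).
fusion = DualChannelAttentionFusion(audio_dim=128, visual_dim=512, fused_dim=256)
fused = fusion(torch.randn(4, 64, 128), torch.randn(4, 64, 512))
print(fused.shape)  # torch.Size([4, 64, 256])
```

For the quantitative comparison, SDR, SIR, and SAR follow the standard BSS-Eval definitions; for example, mir_eval.separation.bss_eval_sources returns all three given reference and estimated source signals.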


Similar Articles

Using audio and visual information for single channel speaker separation

This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single chann...
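For reference, the audio-only soft mask that this work builds on is, in its standard form, a ratio of estimated target energy to total energy in each time-frequency bin. The sketch below is a generic NumPy illustration under assumed spectrogram shapes; it does not reproduce the paper's visually-assisted modification.

```python
# Minimal NumPy sketch of a standard soft (ratio) mask; shapes and estimates are placeholders.
import numpy as np

def soft_mask(target_mag: np.ndarray, interferer_mag: np.ndarray,
              eps: float = 1e-8) -> np.ndarray:
    """Per-bin ratio of target magnitude to total magnitude, in [0, 1]."""
    return target_mag / (target_mag + interferer_mag + eps)

# Toy magnitude spectrogram estimates: 257 frequency bins x 100 frames (assumed sizes).
target_est = np.abs(np.random.randn(257, 100))
interferer_est = np.abs(np.random.randn(257, 100))
mixture = target_est + interferer_est           # simplified mixture magnitude

mask = soft_mask(target_est, interferer_est)
separated_target = mask * mixture               # masked mixture approximates the target speaker
```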


Speaker separation using visual speech features and single-channel audio

This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker’s speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from speakers is used to create a visually-derived Wiener filter. The Wiener filter...
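The Wiener filter referenced here is, in its textbook form, the power-domain counterpart of the soft mask above: a per-bin gain equal to the target's power divided by the total power. Below is a minimal NumPy sketch in which random placeholders stand in for the visually-derived spectral estimates described in the paper.

```python
# Minimal NumPy sketch of a Wiener filter gain; the visually-derived spectrum estimation
# from the paper is not reproduced, random placeholders stand in for it.
import numpy as np

def wiener_gain(target_power: np.ndarray, noise_power: np.ndarray,
                eps: float = 1e-8) -> np.ndarray:
    """Classic Wiener gain: target power over total power, per time-frequency bin."""
    return target_power / (target_power + noise_power + eps)

# Placeholder power spectra (257 bins x 100 frames) for the target and competing speakers.
target_power = np.random.rand(257, 100)
noise_power = np.random.rand(257, 100)

gain = wiener_gain(target_power, noise_power)
# Apply the gain to the mixture's complex STFT to estimate the target speaker's STFT.
mixture_stft = np.random.randn(257, 100) + 1j * np.random.randn(257, 100)
target_stft_est = gain * mixture_stft
```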


Single Channel Audio Source Separation

Blind source separation is an advanced statistical tool that has found widespread use in many signal processing applications. However, the core topic of single-channel audio source separation has not been developed fully enough to reach laboratory implementation. The main approach to single-channel blind source separation is based on exploiting the inherent time structure of sources kn...


Audio-visual integration during overt visual attention

How do different sources of information arising from different modalities interact to control where we look? To answer this question with respect to real-world operational conditions we presented natural images and spatially localized sounds in (V)isual, Audiovisual (AV) and (A)uditory conditions and measured subjects' eye-movements. Our results demonstrate that eye-movements in AV conditions a...


Anthropomorphic Agent as an Integrating Platform of Audio-Visual Information

One of the ultimate human-machine interfaces is an anthropomorphic spoken dialog agent that behaves like a human, with facial animation and gestures, and holds speech conversations with humans. Among the numerous efforts devoted to such a goal, the Galatea Project, conducted by 17 members from 12 universities, is developing an open-source, license-free software toolkit [1] for building an anthropomorphic spoken dia...



Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3287860